Optimization of Cross-Lingual Voice Conversion With Linguistics Losses to Reduce Foreign Accents
Authors
Abstract
Cross-lingual voice conversion (XVC) transforms the speaker identity of a source speaker to that of a target speaker who speaks a different language. Due to the intrinsic differences between languages, the converted speech may carry an unwanted foreign accent. In this paper, we first investigate the intelligibility of the converted speech and confirm the performance degradation caused by the accent/intelligibility issue. With the goal of generating native-sounding speech, the paper further proposes a novel training scheme with two additional linguistic losses for waveform generation: 1) a frame-wise phonetic content loss derived from bottleneck features, and 2) an automatic speech recognition loss on characters. Experiments were conducted on English and Mandarin Chinese conversions. The experimental results confirmed that the generated speech sounds more natural and that the proposed solution significantly improves intelligibility.
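As a rough illustration of how two auxiliary linguistic losses of this kind could be combined with a base training objective, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the distance metric, the form of the recognition loss, or any weighting, so the L1 frame distance, the per-character negative log-likelihood, the weights `w_content` and `w_char`, and all function names below are assumptions made for illustration only.

```python
import math


def phonetic_content_loss(bnf_converted, bnf_reference):
    """Frame-wise mean absolute distance between two bottleneck-feature
    sequences (lists of equal-length frames). L1 distance is an assumption."""
    if len(bnf_converted) != len(bnf_reference):
        raise ValueError("sequences must have the same number of frames")
    total = 0.0
    for frame_c, frame_r in zip(bnf_converted, bnf_reference):
        # Average the absolute difference over the feature dimension.
        total += sum(abs(c - r) for c, r in zip(frame_c, frame_r)) / len(frame_r)
    # Average over frames.
    return total / len(bnf_reference)


def character_loss(char_log_probs, target_ids):
    """Mean negative log-likelihood of the target characters, assuming one
    log-probability vector per target character (a simplification)."""
    nll = [-step[t] for step, t in zip(char_log_probs, target_ids)]
    return sum(nll) / len(nll)


def total_loss(base, content, char, w_content=1.0, w_char=1.0):
    """Weighted sum of a base loss and the two linguistic losses.
    The weights are hypothetical hyperparameters."""
    return base + w_content * content + w_char * char


# Toy usage with made-up numbers.
content = phonetic_content_loss([[0.0, 0.0]], [[1.0, 3.0]])
char = character_loss([[math.log(0.5), math.log(0.5)]], [0])
loss = total_loss(1.0, content, char, w_content=0.5, w_char=2.0)
```

In a real system the bottleneck features and character posteriors would come from a pretrained speech recognizer applied to the converted and reference speech; here they are plain nested lists so the arithmetic stays visible.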
Similar resources
Cross-Lingual Voice Conversion
Cross-lingual voice conversion refers to the automatic transformation of a source speaker’s voice to a target speaker’s voice in a language that the target speaker cannot speak. It involves a set of statistical analysis, pattern recognition, machine learning, and signal processing techniques. This study focuses on the problems related to cross-lingual voice conve...
Frame alignment method for cross-lingual voice conversion
Most of the existing voice conversion methods calculate the optimal transformation function from a given set of paired acoustic vectors of the source and target speakers. The alignment of the phonetically equivalent source and target frames is problematic when the training corpus available is not parallel, although this is the most realistic situation. The alignment task is even more difficult ...
Articulatory-based conversion of foreign accents with deep neural networks
We present an articulatory-based method for real-time accent conversion using deep neural networks (DNN). The approach consists of two steps. First, we train a DNN articulatory synthesizer for the non-native speaker that estimates acoustics from contextualized articulatory gestures. Then we drive the DNN with articulatory gestures from a reference native speaker –mapped to the nonnative articul...
Spectral Mapping Using Artificial Neural Networks for Intra-lingual and Cross-lingual Voice Conversion
CERTIFICATE This is to certify that the work contained in this thesis titled Spectral mapping using have not been submitted to any other Institute or University for the award of any degree or diploma. Date Mr. Kishore Prahallad ii ACKNOWLEDGEMENTS I would like to express my deepest appreciation to Kishore Prahallad, my advisor for his guidance, encouragement and support throughout my duration a...
Adding Glottal Source Information to Intra-Lingual Voice Conversion
This paper studies the inclusion of glottal source characteristics in voice conversion (VC) systems. We use source/filter decomposition to parametrize the vocal tract using LSF, the glottal source using the LF model, and the aspiration noise using amplitude-modulated high-pass filtered AWGN noise. To evaluate the impact of this new parametrization in VC, we use a reference conversion system tha...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2023
ISSN: 2329-9304, 2329-9290
DOI: https://doi.org/10.1109/taslp.2023.3271107